37 research outputs found

    A Taxonomy of Prompt Modifiers for Text-To-Image Generation

    Text-to-image generation has seen an explosion of interest since 2021. Today, beautiful and intriguing digital images and artworks can be synthesized from textual inputs ("prompts") with deep generative models. Online communities around text-to-image generation and AI generated art have quickly emerged. This paper identifies six types of prompt modifiers used by practitioners in the online community based on a 3-month ethnographic study. The novel taxonomy of prompt modifiers provides researchers with a conceptual starting point for investigating the practice of text-to-image generation, and may also help practitioners of AI generated art improve their images. We further outline how prompt modifiers are applied in the practice of "prompt engineering." We discuss research opportunities of this novel creative practice in the field of Human-Computer Interaction (HCI). The paper concludes with a discussion of broader implications of prompt engineering from the perspective of Human-AI Interaction (HAI) in future applications beyond the use case of text-to-image generation and AI generated art. Comment: 15 pages

    Prompting AI Art: An Investigation into the Creative Skill of Prompt Engineering

    Humankind is entering a novel era of creativity - an era in which anybody can synthesize digital content. The paradigm under which this revolution takes place is prompt-based learning (or in-context learning). This paradigm has found fruitful application in text-to-image generation, where it is being used to synthesize digital images from zero-shot text prompts in natural language for the purpose of creating AI art. This activity is referred to as prompt engineering - the practice of iteratively crafting prompts to generate and improve images. In this paper, we investigate prompt engineering as a novel creative skill for creating prompt-based art. In three studies with participants recruited from a crowdsourcing platform, we explore whether untrained participants could 1) recognize the quality of prompts, 2) write prompts, and 3) improve their prompts. Our results indicate that participants could assess the quality of prompts and respective images. This ability increased with the participants' experience and interest in art. Participants were further able to write prompts in rich descriptive language. However, even though participants were specifically instructed to generate artworks, their prompts were missing the specific vocabulary needed to apply a certain style to the generated images. Our results suggest that prompt engineering is a learned skill that requires expertise and practice. Based on our findings and experience with running our studies with participants recruited from a crowdsourcing platform, we provide ten recommendations for conducting experimental research on text-to-image generation and prompt engineering with a paid crowd. Our studies offer a deeper understanding of prompt engineering, thereby opening up avenues for research on the future of prompt engineering. We conclude by speculating on four possible futures of prompt engineering. Comment: 29 pages, 10 figures

    Perceptions and Realities of Text-to-Image Generation

    Generative artificial intelligence (AI) is a widely popular technology that will have a profound impact on society and individuals. Less than a decade ago, it was thought that creative work would be among the last to be automated - yet today, we see AI encroaching on many creative domains. In this paper, we present the findings of a survey study on people's perceptions of text-to-image generation. We touch on participants' technical understanding of the emerging technology, their fears and concerns, and thoughts about risks and dangers of text-to-image generation to the individual and society. We find that while participants were aware of the risks and dangers associated with the technology, only a few participants considered the technology to be a personal risk. The risks for others were easier for participants to recognize. Artists in particular were seen as being at risk. Interestingly, participants who had tried the technology rated its future importance lower than those who had not tried it. This result shows that many people are still oblivious to the potential personal risks of generative artificial intelligence and the impending societal changes associated with this technology. Comment: ACM Academic Mindtrek 202

    Text-to-Image Generation: Perceptions and Realities

    Generative AI is an emerging technology that will have a profound impact on society and individuals. Only a decade ago, it was thought that creative work would be among the last to be automated - yet today, we see AI encroaching on creative domains. In this paper, we present the key findings of a survey study on people's perceptions of text-to-image generation. We touch on participants' technical understanding of the emerging technology, their ideas for potential application areas, as well as concerns, risks, and dangers of text-to-image generation to society and the individual. The study found that participants were aware of the risks and dangers associated with the technology, but only a few participants considered the technology to be a risk to themselves. Additionally, those who had tried the technology rated its future importance lower than those who had not. Comment: Accepted at Generative AI in HCI workshop, CHI '2

    DDB-EDM to FaBiO: The Case of the German Digital Library

    Cultural heritage portals have the goal of providing users with seamless access to all their resources. This paper introduces initial efforts for a user-oriented restructuring of the German Digital Library (DDB). At present, cultural heritage objects (CHOs) in the DDB are modeled using an extended version of the Europeana Data Model (DDB-EDM), which negatively impacts usability and exploration. These challenges can be addressed by leveraging ontologies and building a knowledge graph from the DDB's voluminous collection. Towards this goal, an alignment of bibliographic metadata from DDB-EDM to the FRBR-Aligned Bibliographic Ontology (FaBiO) is presented.

    DDB-KG: The German Bibliographic Heritage in a Knowledge Graph

    Under the German government’s initiative “NEUSTART Kultur”, the German Digital Library or Deutsche Digitale Bibliothek (DDB) is undergoing improvements to enhance user experience. As an initial step, emphasis is placed on creating a knowledge graph from the bibliographic record collection of the DDB. This paper discusses the challenges facing the DDB in terms of retrieval and the solutions for addressing them. In particular, limitations of the current data model or ontology in representing bibliographic metadata are analyzed through concrete examples. This study presents the complete ontological mapping from the DDB-Europeana Data Model (DDB-EDM) to FaBiO, and a prototype of the DDB-KG made available as a SPARQL endpoint. The suitability of the target ontology is demonstrated with SPARQL queries formulated from competency questions.

    2VT: Visions, Technologies, and Visions of Technologies for Understanding Human Scale Spaces

    Spatial experience is an important subject in various fields, and in HCI it has been mostly investigated at the urban scale. Research on human scale spaces has focused mostly on the personal meaning or aesthetic and embodied experiences in the space. Further, spatial experience is increasingly topical in envisioning how to build and interact with technologies in our everyday lived environments, particularly in so-called smart cities. This workshop brings together researchers and practitioners from diverse fields to collaboratively discover new ways to understand and capture human scale spatial experience and envision its implications for future technological and creative developments in our habitats. Using a speculative design approach, we sketch concrete solutions that could help to better capture critical features of human scale spaces and allow for unique possibilities for aspects such as urban play. As a result, we hope to contribute a road map for future HCI research on human scale spatial experience and its application.

    Crowd-powered creativity support systems

    Abstract Crowdsourcing has great potential in supporting humans to be more creative. This doctoral dissertation explores crowd-powered creativity support systems and covers a research arc from the fundamental prerequisites of leveraging crowds for creativity support to an accompanying set of case studies to clarify how complex creative work can be supported in practice.

    Crowdsourcing creative work

    Abstract Creative work is launched on paid crowdsourcing platforms, yet we lack an in-depth understanding of how the two key stakeholders of crowdsourcing platforms (crowd workers and requesters) perceive and experience creative work. Creativity is a human characteristic that is difficult to automate by machines, and supplying requesters with crowdsourced human insights and complex creative work is, therefore, a timely topic for research. According to value-sensitive design, the integration of human insight into complex socio-technical systems will need to consider the perspectives of the two key stakeholders. This article-based doctoral thesis explores the stakeholder perspectives and experiences of crowdsourced creative work on two of the leading crowdsourcing platforms. The thesis has two parts. In the first part, we explore creative work from the perspective of the crowd worker. In the second part, we explore and study the requester’s perspective in different contexts and several case studies. The research is exploratory and we contribute empirical insights using survey-based and artefact-based approaches common in the field of Human-Computer Interaction (HCI). In the former approach, we explore the key issues that may limit creative work on paid crowdsourcing platforms. In the latter approach, we create computational artefacts to elicit authentic experiences from both crowd workers and requesters of crowdsourced creative work. The thesis contributes a classification of crowd workers into five archetypal profiles, based on the crowd workers’ demographics, disposition, and preferences for creative work. We propose a three-part classification of creative work on crowdsourcing platforms: creative tasks, creativity tests, and creativity judgements (also referred to as creative feedback). The thesis further investigates the emerging research topic of how requesters can be supported in interpreting and evaluating complex creative work. 
Last, we discuss the design implications for research and practice and contribute a vision of creative work on future crowdsourcing platforms with the aim of empowering crowd workers and fostering an ecosystem around tailored platforms for creative microwork.
    Tiivistelmä (Finnish abstract, translated) Paid crowdsourcing platforms are used for creative work, yet we still lack an in-depth understanding of how the two key stakeholder groups of crowdsourcing platforms (crowd workers and requesters) understand and experience creative work. Creativity is a human characteristic that is difficult to automate, and supplying requesters with crowdsourced human insights and complex creative work is therefore a timely research topic. According to value-sensitive design, integrating human insight into complex socio-technical systems must take into account the perspectives of the two key stakeholder groups. This article-based doctoral thesis examines the stakeholders' perspectives on and experiences of crowdsourced creative work on two crowdsourcing platforms. The thesis consists of two parts. The first part examines creative work from the crowd worker's perspective. The second part examines the requester's perspective in several case studies. The research aims to deepen empirical understanding by using survey-based approaches and approaches common in Human-Computer Interaction (HCI). The former approach examines the key issues that may limit creative work on paid crowdsourcing platforms. The latter approach creates computational artefacts intended to elicit authentic experiences from crowd workers and requesters of crowdsourced creative work. The thesis classifies crowd workers into five archetypal profiles based on their demographics, mindsets, and preferences for creative work. It proposes a three-part classification of creative work on crowdsourcing platforms: tasks requiring creativity, creativity tests, and creativity judgements (also called creative feedback). The thesis also investigates the emerging research topic of how requesters can be supported in interpreting and evaluating complex creative work. Finally, the study considers the implications of the model for research and practice and presents a vision of creative work on future crowdsourcing platforms, aimed at improving crowd workers' capabilities and supporting an ecosystem built around tailored environments for creative microwork.